An HMM-based Farsi OCR

Author

  • Vahab Pournaghshband

Abstract

OCR (Optical Character Recognition) is the digital encoding of printed and handwritten characters from an image file created by a scanner or another optical imaging device. In other words, OCR software converts images of text into machine-encoded digital text (Figure 1). Although OCR has served for several decades as a standard application of various learning methods in the machine learning literature, little research has addressed Farsi specifically, mainly because of the cursive nature of Farsi script.
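
The abstract does not say how the HMMs are structured or trained. As a rough illustration of how an HMM-based recognizer can assign a label to a feature sequence, the sketch below scores a discrete observation sequence under one HMM per class with the scaled forward algorithm and picks the most likely class. The class `DiscreteHMM`, the function `classify`, the two-state models, and the labels `alef` and `be` are hypothetical, and every parameter is invented for illustration rather than taken from the paper.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-observation HMM with start probabilities pi[i],
    transition probabilities A[i][j], and emission probabilities B[i][k]."""

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]):
        self.pi, self.A, self.B = pi, A, B

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs | model) via the forward algorithm, rescaled at each step
        so that long sequences do not underflow."""
        n = len(self.pi)
        log_p = 0.0
        alpha: List[float] = []
        for t, o in enumerate(obs):
            if t == 0:
                # Initialization: alpha_1(i) = pi_i * b_i(o_1)
                alpha = [self.pi[i] * self.B[i][o] for i in range(n)]
            else:
                # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
                alpha = [sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][o]
                         for j in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # the model cannot produce this sequence
            alpha = [a / scale for a in alpha]
            log_p += math.log(scale)  # product of scale factors = P(obs | model)
        return log_p


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the label of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Hypothetical two-state left-to-right models over a 3-symbol alphabet.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "alef": DiscreteHMM([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]),
    "be": DiscreteHMM([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]),
}
print(classify([0, 0, 1], models))  # → alef
```

Working with per-step scale factors rather than raw probabilities keeps the likelihood computation numerically stable, which matters once observation sequences grow to the length of a scanned word.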

Similar sources

Multi-Font Farsi/Arabic Isolated Character Recognition Using Chain Codes

OCR systems now have many applications and are increasingly used in daily life. Much research has been done on recognizing Latin, Japanese, and Chinese characters, but very little on recognizing Farsi/Arabic characters, probably because those characters are more difficult and complex to identify compared to th...

A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research

This paper presents a new comprehensive database of isolated offline handwritten Farsi/Arabic numerals and characters for use in optical character recognition research. The database is freely available for academic use; until now, no comparable free database has been available for Farsi. It includes grayscale images of 52,380 characters and 17,740 numerals. Each image was scanned from Iranian scho...

Unconstrained Farsi handwritten word recognition using fuzzy vector quantization and hidden Markov models

An unconstrained Farsi handwritten word recognition system based on fuzzy vector quantization (FVQ) and hidden Markov model (HMM) for reading city names in postal addresses is presented. Preprocessing techniques including binarization, noise removal, slope correction and baseline estimation are described. Each word image is represented by its contour information. The histogram of chain code slo...
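
Binarization and baseline estimation recur throughout Farsi/Arabic OCR preprocessing. A minimal sketch of both, assuming a grayscale image stored as nested lists with 0 for black and 255 for white: binarization with a fixed global threshold, and baseline estimation taken as the row of the horizontal projection profile with the most ink pixels, a common heuristic for cursive Farsi/Arabic script, where letters join along the baseline. The threshold of 128 and the function names are assumptions; the summarized paper may use different methods.

```python
from typing import List

Gray = List[List[int]]    # grayscale pixels, 0 = black ... 255 = white
Binary = List[List[int]]  # 1 = ink, 0 = background


def binarize(gray: Gray, threshold: int = 128) -> Binary:
    """Global-threshold binarization: pixels darker than threshold become ink."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def horizontal_projection(binary: Binary) -> List[int]:
    """Number of ink pixels in each row."""
    return [sum(row) for row in binary]


def estimate_baseline(binary: Binary) -> int:
    """Index of the row with the most ink (ties go to the topmost such row)."""
    profile = horizontal_projection(binary)
    return max(range(len(profile)), key=lambda r: profile[r])


# A tiny hypothetical word image: a horizontal stroke on row 2 joins the strokes.
gray = [
    [255, 255,   0, 255, 255, 255],
    [255, 255,   0, 255, 255, 255],
    [  0,   0,   0,   0,   0, 255],
    [255, 255, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
]
binary = binarize(gray)
print(horizontal_projection(binary))  # → [1, 1, 5, 1, 0]
print(estimate_baseline(binary))      # → 2
```

A fixed threshold keeps the sketch short; scanned documents with uneven lighting usually call for an adaptive or histogram-based threshold instead.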

Farsi Handwritten Word Recognition Using Discrete HMM and Self-Organizing Feature Map

A holistic system for the recognition of handwritten Farsi/Arabic words using right-left discrete hidden Markov models (HMM) and Kohonen self-organizing vector quantization (SOFM/VQ) for reading city names in postal addresses is presented. Preprocessing techniques including binarization, noise removal, and enclosing the word in a bounding rectangle are described. Each word image is scanned from r...
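
Vector quantization with a self-organizing feature map turns continuous feature vectors into the discrete symbols that a discrete HMM consumes. A minimal sketch of a one-dimensional Kohonen map used as a VQ codebook, assuming Euclidean distance, a Gaussian neighborhood on the map, and a linearly decaying learning rate and radius; `train_som`, `quantize`, the toy clusters, and all hyperparameters are hypothetical and not taken from the summarized paper.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_unit(codebook: List[List[float]], v: Vector) -> int:
    """Index of the codebook vector closest to v (the winning unit)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k], v))


def train_som(data: List[Vector], n_units: int, epochs: int = 100,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D Kohonen map whose units become VQ codebook vectors."""
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen training vector.
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(1.0, n_units / 2)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            v = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            winner = nearest_unit(codebook, v)
            for k in range(n_units):
                grid_dist = abs(k - winner)  # distance between units on the 1-D map
                h = math.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
                codebook[k] = [c + lr * h * (x - c) for c, x in zip(codebook[k], v)]
            step += 1
    return codebook


def quantize(codebook: List[List[float]], vectors: List[Vector]) -> List[int]:
    """Replace each feature vector by the index of its winning unit (a discrete symbol)."""
    return [nearest_unit(codebook, v) for v in vectors]


# Two well-separated clusters of hypothetical 2-D features, two codebook units.
cluster_a = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.0)]
cluster_b = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.0)]
codebook = train_som(cluster_a + cluster_b, n_units=2)
symbols_a = quantize(codebook, cluster_a)
symbols_b = quantize(codebook, cluster_b)
```

Unlike plain k-means, the neighborhood term pulls neighboring units together during training, so nearby symbol indices tend to correspond to similar feature vectors.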

An HMM-Based Legal Amount Field OCR System for Checks

The system described in this paper applies Hidden Markov technology to the task of recognizing the handwritten legal amount on personal checks. We argue that the most significant source of error in handwriting recognition is the segmentation process. In traditional handwriting OCR systems, recognition is performed at the character level, using the output of an independent segmentation step. Usi...
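
A common argument for HMMs in this setting is that Viterbi decoding over a chain of character states yields the segmentation as a by-product of recognition, instead of relying on a separate segmentation step. The sketch below illustrates that idea under stated assumptions: a toy two-state left-to-right model in which each state stands for one hypothetical character (`c1`, `c2`), log-space Viterbi decoding, and a helper `segment` that reads character boundaries off the best state path. None of these names or parameters come from the summarized paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs: Sequence[int], pi: List[float], A: List[List[float]],
            B: List[List[float]]) -> Tuple[List[int], float]:
    """Most likely state path for obs and its log probability (log space avoids underflow)."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor state for reaching state j at this time step.
            best = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            new_delta.append(delta[best] + _log(A[best][j]) + _log(B[j][o]))
            ptr.append(best)
        delta = new_delta
        backpointers.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):  # trace the best path backwards
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def segment(path: List[int], state_to_char: Dict[int, str]) -> List[Tuple[str, int, int]]:
    """Group consecutive frames by character: (char, start_frame, end_frame_exclusive)."""
    labels = [state_to_char[s] for s in path]
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments


# Toy left-to-right model: state 0 = hypothetical character "c1", state 1 = "c2".
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.1, 0.9]]
path, log_p = viterbi([0, 0, 0, 1, 1], pi, A, B)
print(path)                               # → [0, 0, 0, 1, 1]
print(segment(path, {0: "c1", 1: "c2"}))  # → [('c1', 0, 3), ('c2', 3, 5)]
```

Here the boundary between `c1` and `c2` at frame 3 is never computed explicitly; it falls out of the single best path through the combined model.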

Publication date: 2007